organic compound
A Transformer Based Generative Chemical Language AI Model for Structural Elucidation of Organic Compounds
For over half a century, computer-aided structural elucidation (CASE) systems for organic compounds have relied on complex expert systems with explicitly programmed algorithms. These systems are often computationally inefficient for complex compounds because of the vast chemical structural space that must be explored and filtered. In this study, we present a proof-of-concept transformer-based generative chemical language artificial intelligence (AI) model, an innovative end-to-end architecture designed to replace the logic and workflow of the classic CASE framework for ultra-fast and accurate spectroscopy-based structural elucidation. Our model employs an encoder-decoder architecture and self-attention mechanisms, similar to those in large language models, to directly generate the most probable chemical structures that match the input spectroscopic data. Trained on ~102k IR, UV, and 1H NMR spectra, it performs structural elucidation of molecules with up to 29 atoms in just a few seconds on a modern CPU, achieving a top-15 accuracy of 83%. This approach demonstrates the potential of transformer-based generative AI to accelerate traditional scientific problem-solving processes. The model's ability to iterate quickly based on new data highlights its potential for rapid advancements in structural elucidation.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- North America > United States > Massachusetts > Middlesex County > Acton (0.04)
- North America > United States > New York > New York County > New York City (0.04)
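The top-15 accuracy quoted in the abstract above can be read as the fraction of test spectra for which the correct structure appears among the 15 highest-ranked candidates. That reading is an assumption, since the abstract does not define the metric. A minimal sketch in Python, where the function name `top_k_accuracy` and the toy candidate lists are hypothetical and not taken from the paper:

```python
def top_k_accuracy(ranked_candidates, targets, k=15):
    """Fraction of cases whose true structure is among the first k candidates.

    ranked_candidates: one list per test case, best guess first.
    targets: the true structure for each test case.
    Structures are compared as plain strings, so this assumes both sides
    use the same canonical notation (an assumption of this sketch).
    """
    if len(ranked_candidates) != len(targets):
        raise ValueError("need exactly one candidate list per target")
    if not targets:
        raise ValueError("need at least one test case")
    hits = sum(
        1 for candidates, target in zip(ranked_candidates, targets)
        if target in candidates[:k]
    )
    return hits / len(targets)


# Toy data, made up for illustration only.
predictions = [
    ["CCO", "COC"],   # correct answer is ranked second
    ["CC=O", "C=CO"], # correct answer is missing
    ["CCN", "CNC"],   # correct answer is ranked first
]
truth = ["COC", "CCC", "CCN"]

top1 = top_k_accuracy(predictions, truth, k=1)
top2 = top_k_accuracy(predictions, truth, k=2)
```

Raising `k` can only keep or increase the score, which is why a top-15 figure is more forgiving than a top-1 figure.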
Syntactically Robust Training on Partially-Observed Data for Open Information Extraction
Qi, Ji, Chen, Yuxiang, Hou, Lei, Li, Juanzi, Xu, Bin
Open Information Extraction models have shown promising results with sufficient supervision. However, these models face a fundamental challenge: the syntactic distribution of the training data is only partially observed compared with the real world. In this paper, we propose a syntactically robust training framework that enables models to be trained on a syntactically abundant distribution based on diverse paraphrase generation. To tackle the intrinsic problem of knowledge deformation in paraphrasing, two algorithms based on semantic similarity matching and syntactic tree walking are used to restore the expressionally transformed knowledge. The training framework can be generally applied to other syntactically partially observable domains. Based on the proposed framework, we build a new evaluation set called CaRB-AutoPara, a syntactically diverse dataset consistent with the real-world setting for validating the robustness of the models. Experiments including a thorough analysis show that the performance of the model degrades as the difference in syntactic distribution increases, while our framework gives a robust boundary. The source code is publicly available at https://github.com/qijimrc/RobustOIE.
- North America > United States > California > San Diego County > San Diego (0.04)
- North America > Canada (0.04)
- Europe > United Kingdom > England (0.04)
Building a Better Nose
For dog lovers, the idea of friendly canines as living, breathing, tail-wagging cancer detectors is a hopeful one. Not only do dogs conjure smiles, but their known olfactory abilities would offer a strange contrast to the sterile medical exam rooms many find dreadful: brushed steel countertops, white lab coats, buttercup walls, and the penetrating smell of disinfectants. But if dogs have already shown the ability to detect cancer on human breath or urine, researchers have now found one better: Ants could be a more cost-effective means of harnessing the same super-sniffing abilities of their distant cousin canines to help detect cancer and other illnesses in humans. We may eventually be able to use both dogs and ants to train artificial intelligence-powered devices to do the same thing. "Insects have a life that is much shorter than that of mammals. They have to learn fast," says Patrizia d'Ettorre, an expert in ant behavior at University Paris 13 in France.
- Europe > France (0.25)
- North America > United States > Indiana > Marion County > Indianapolis (0.05)
Machine learning identification of organic compounds using visible light
Bikku, Thulasi, Fritz, Rubén A., Colón, Yamil J., Herrera, Felipe
Identifying chemical compounds is essential in several areas of science and engineering. Laser-based techniques are promising for autonomous compound detection because the optical response of materials encodes enough electronic and vibrational information for remote chemical identification. This has been exploited using the fingerprint region of infrared absorption spectra, which involves a dense set of absorption peaks that are unique to individual molecules, thus facilitating chemical identification. However, optical identification using visible light has not been realized. Using decades of experimental refractive index data from the scientific literature for pure organic compounds and polymers, covering a broad range of frequencies from the ultraviolet to the far-infrared, we develop a machine learning classifier that can accurately identify organic species based on a single-wavelength dispersive measurement in the visible spectral region, away from absorption resonances. The optical classifier proposed here could be applied to autonomous material identification protocols or applications.
How AWS uses graph neural networks to meet customer needs
Graphs are an information-rich way to represent data. A graph consists of nodes -- typically represented by circles -- and edges -- typically represented as line segments between nodes. In a knowledge graph, for instance, the nodes represent entities, and the edges represent relationships between them. In a social graph, the nodes represent people, and an edge indicates that two of those people know each other. At Amazon Web Services, the use of machine learning (ML) to make the information encoded in graphs more useful to our customers has been a major research focus.
- Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
- Information Technology > Services (0.90)
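The node-and-edge description in the AWS item above can be sketched in a few lines of Python. This is only an illustration of the two graph types it mentions. The entities, relationships, and helper names (`build_adjacency`, `relations_of`) are hypothetical and do not come from any AWS library.

```python
from collections import defaultdict

# Knowledge graph: each edge is a (head entity, relationship, tail entity) triple.
# The triples are illustrative examples, not data from the article.
knowledge_edges = [
    ("aspirin", "treats", "headache"),
    ("aspirin", "is_a", "drug"),
]

# Social graph: an undirected edge means the two people know each other.
social_edges = [("Ana", "Ben"), ("Ben", "Chen")]


def build_adjacency(edges):
    """Map each node to the set of nodes it shares an undirected edge with."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    return adjacency


def relations_of(entity, triples):
    """List the (relationship, tail) pairs whose head is the given entity."""
    return [(rel, tail) for head, rel, tail in triples if head == entity]


friends = build_adjacency(social_edges)
aspirin_facts = relations_of("aspirin", knowledge_edges)
```

Here `friends["Ben"]` holds both of Ben's acquaintances, while `aspirin_facts` lists the labeled relationships leaving the `aspirin` node.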
High-throughput discovery of chemical structure-polarity relationships combining automation and machine learning techniques
Xu, Hao, Lin, Jinglong, Liu, Qianyi, Chen, Yuntian, Zhang, Jianning, Yang, Yang, Young, Michael C., Xu, Yan, Zhang, Dongxiao, Mo, Fanyang
As an essential attribute of organic compounds, polarity has a profound influence on many molecular properties such as solubility and phase transition temperature. Thin layer chromatography (TLC) is a commonly used technique for polarity measurement. However, current TLC analysis presents several problems, including the need for a large number of attempts to obtain suitable conditions, as well as irreproducibility due to non-standardization. Herein, we describe an automated experiment system for TLC analysis. This system is designed to conduct TLC analysis automatically, facilitating high-throughput experimentation by collecting large amounts of experimental data under standardized conditions. Using these datasets, machine learning (ML) methods are employed to construct surrogate models correlating organic compounds' structures with their polarity, expressed as the retardation factor (Rf). The trained ML models are able to predict the Rf value curve of organic compounds with high accuracy. Furthermore, the constitutive relationship between the compound and its polarity can also be discovered through these modeling methods, and the underlying mechanism is rationalized through adsorption theories. The trained ML models not only reduce the need for the empirical optimization currently required for TLC analysis, but also provide general guidelines for the selection of conditions, making TLC an easily accessible tool for the broad scientific community.
- Asia > China > Beijing > Beijing (0.04)
- Asia > China > Guangdong Province > Shenzhen (0.04)
- North America > United States > Ohio (0.04)
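The retardation factor (Rf) used in the abstract above is conventionally the distance travelled by the compound spot divided by the distance travelled by the solvent front, both measured from the baseline. The abstract does not spell this out, so the sketch below relies on that standard definition. The function name and the example distances are hypothetical.

```python
def retardation_factor(compound_distance, solvent_front_distance):
    """Return Rf = spot distance / solvent-front distance (same units for both)."""
    if solvent_front_distance <= 0:
        raise ValueError("solvent front distance must be positive")
    if not 0 <= compound_distance <= solvent_front_distance:
        raise ValueError("spot must lie between the baseline and the solvent front")
    return compound_distance / solvent_front_distance


# Illustrative measurement in centimetres (made-up numbers).
rf = retardation_factor(2.1, 6.0)
```

Because Rf is a ratio, it always falls between 0 and 1 and has no units.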
Classification via an Embedded Approach
Rubio, Jose de Jesus, Avila, Francisco Jacob, Melendez, Adolfo, Stein, Juan Manuel, Meda, Jesus Alberto, Aguilar, Carlos
This paper presents the results of an automated volatile organic compound (VOC) classification process implemented by embedding a machine learning algorithm into an Arduino Uno board. An electronic nose prototype is constructed to detect VOCs from three different fruits. The electronic nose is built from an array of five tin dioxide (SnO2) gas sensors, an Arduino Uno board used as the data acquisition section, and an intelligent classification module, implemented as an embedded approach function that receives data signals from the electronic nose. For the intelligent classification module, a training algorithm is also implemented to create the base of a portable, automated, fast-response, and economical electronic nose device. This solution proposes a portable system to identify and classify VOCs without using a personal computer (PC). Results show an acceptable precision for the embedded approach in comparison with the performance of a toolbox used on a PC. This constitutes an embedded solution able to recognize VOCs reliably, on which application products that classify electronic nose data as VOCs can be built for a wide variety of industries. With the proposed and implemented algorithm, a classification precision of 99% was achieved in the embedded solution.
- North America > Mexico > Mexico City (0.28)
- North America > United States > Massachusetts (0.14)
- Europe > Switzerland (0.14)
- Health & Medicine (0.94)
- Energy > Oil & Gas > Downstream (0.68)
- Materials > Chemicals > Industrial Gases (0.46)
- Materials > Chemicals > Commodity Chemicals > Petrochemicals (0.46)
This AI engine only needs a whiff of your breath to detect illness
Researchers at a British university are working on an artificial intelligence (AI) engine that can diagnose illness simply by smelling the breath of a person. Andrea Soltoggio, a member of the data science team at Loughborough University, said the engine is being taught how to identify a range of illness-revealing substances that humans might exhale. "Compared to that of animals, the human sense of smell is far less developed and certainly not used to carry out daily activities. For this reason, humans aren't particularly aware of the richness of information that can be transmitted through the air, and can be perceived by a highly sensitive olfactory system. AI may be about to change that," Soltoggio wrote in an article for online publication Smithsonian.com.
NASA robot finds 'building blocks for life' on Mars
A NASA robot has found more building blocks for life on Mars, the most complex organic matter yet from 3.5 billion-year-old rocks on the surface of the red planet, the US space agency said on Thursday. The unmanned Curiosity rover has also found increasing evidence for seasonal variations of methane on Mars, indicating the source of the gas is likely the planet itself, or possibly its subsurface water. The data, collected through drilling into the lowest point of the red planet's Gale crater, is part of the US space agency's newly widened search for organic molecules that could indicate past life on the surface of Mars. Additional data from the robotic probe confirms the detection of "seasonal patterns" in methane levels, NASA geophysicist Ashwin Vasavada said in the live-streamed announcement. NASA scientist Chris Webster confirmed that water has been found on the Martian surface and has been present for "a very long time," which points strongly toward a "habitable environment".
- Government > Space Agency (1.00)
- Government > Regional Government > North America Government > United States Government (1.00)
IBM's AI learns how to predict the outcomes of chemical reactions
By thinking of atoms as letters and molecules as words, an artificial intelligence (AI) from IBM is now using the same neural network techniques that other AIs use to translate between languages to predict the outcomes of organic chemical reactions, and the breakthrough could help speed up the development of new drugs. Scientists have been trying to teach computers about chemistry for decades in the hope that one day they will be able to help discover and predict the outcomes of chemical reactions, but organic chemicals can be extraordinarily complex, and past simulations of their behaviours have been at best time-consuming and inaccurate. Now, though, the team at IBM and their new AI have tried a different technique to solve this thorny problem. "Instead of translating English into German or Chinese, we had the same artificial intelligence technology look at hundreds of thousands or millions of chemical reactions and got it to learn the basic structure of the 'language' of organic chemistry, and then we had it try to predict the outcomes of possible organic chemical reactions," said the study's co-author Teodoro Laino from IBM Research's lab in Zurich. "We want to help chemists design new synthesis routes for organic compounds," he added.
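To make the "atoms as letters, molecules as words" idea concrete, here is a minimal Python sketch that splits a molecule written as a SMILES string into atom-level tokens, the kind of sequence a translation-style neural network could consume. The article does not say which notation or tokenizer the IBM team used, so SMILES, the simplified regular expression, and the `tokenize` helper are all assumptions of this sketch.

```python
import re

# Simplified SMILES token pattern (an assumption, not IBM's tokenizer):
# bracket atoms, two-letter halogens, one-letter atoms, aromatic atoms,
# bond/branch/charge symbols, and ring-closure digits.
SMILES_TOKEN = re.compile(
    r"\[[^\]]+\]|Br|Cl|[BCNOSPFI]|[bcnosp]|[=#\-+()/\\.@%]|\d"
)


def tokenize(smiles):
    """Split a SMILES string into tokens, failing loudly on unknown characters."""
    tokens = SMILES_TOKEN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"unrecognized characters in SMILES: {smiles!r}")
    return tokens


acetic_acid = tokenize("CC(=O)O")
chlorobenzene = tokenize("c1ccccc1Cl")
table_salt = tokenize("[Na+].[Cl-]")
```

Each token plays the role of a letter, so a whole molecule reads as a word; for example, `chlorobenzene` ends with the single token `Cl` rather than separate `C` and `l` characters.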